Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations
Authors
Abstract
In this paper, we propose a new bulk-loading technique for high-dimensional indexes, which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamically inserting single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a priori knowledge of the complete data set to improve both construction time and query performance. Our algorithm operates in a manner similar to the Quicksort algorithm and has an average runtime complexity of O(n log n). We additionally improve the query performance by optimizing the shape of the bounding boxes, by completely avoiding overlap, and by clustering the pages on disk. As we show analytically, the split strategy typically used in dynamic index structures, splitting the data space at the 50%-quantile, results in poor query performance in high-dimensional spaces. Therefore, we use a sophisticated unbalanced split strategy, which leads to a much better space partitioning. An exhaustive experimental evaluation shows that our technique clearly outperforms both classic index construction and competing bulk-loading techniques. In comparison with dynamic index construction, we achieve a speed-up factor of up to 588 for the construction time. The constructed index causes up to 16.88 times fewer page accesses and is up to 198 times faster (real time) in query processing.
Similar resources
Improving the Query Performance of High-Dimensional Index Structures Using Bulk-Load Operations
In this paper, we propose a new bulk-loading technique for high-dimensional indexes which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamic insertion of single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a pri...
A Method Based on Divisive Hierarchical Clustering for Indexing Image Information
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been made so far to develop multi-dimensional indexing structures. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
A Spatial Grid File for Multimedia Data Representation
In multimedia databases, spatial or high-dimensional data manipulation is important for storage and retrieval. In this study, we introduce a new file structure called the Spatial Grid File. This file enables us to index data objects by different and independent high-dimensional attributes. With it, well-known spatial query types, such as range queries, nearest neighbor queries and spatial join ...
Worcester Polytechnic Institute
A data warehouse typically differs from an OLTP database in terms of both significantly larger data page sizes and the volume of data inserted in bulk. The traditional B+ Tree and its variants, while still a popular candidate for supporting point and range queries, can become very memory intensive for insert and delete operations under these more stringent requirements. Since typ...
A Generic Approach to Bulk Loading Multidimensional Index Structures
Recently there has been an increasing interest in supporting bulk operations on multidimensional index structures. Bulk loading refers to the process of creating an initial index structure for a presumably very large data set. In this paper, we present a generic algorithm for bulk loading which is applicable to a broad class of index structures. Our approach differs completely from previous one...